Description

Background and Context

Service businesses such as banks have to contend with churn, i.e. customers leaving for another service provider. It is important to understand which aspects of the service influence a customer's decision to leave, so that management can prioritize its service-improvement efforts accordingly.

Objective

Given a bank customer, build a neural-network-based classifier that predicts whether they will leave within the next 6 months.

Data Description

The case study uses an open-source dataset from Kaggle. The dataset contains 10,000 sample points with 14 distinct features, such as CustomerId, CreditScore, Geography, Gender, Age, Tenure, Balance, etc.

Data Dictionary

* 0 = No (customer did not leave the bank)
* 1 = Yes (customer left the bank)

Reading Dataset and Feature Elimination

Perform an Exploratory Data Analysis on the data

Univariate Analysis

Bivariate Analysis

Illustrate the insights based on EDA

- Key meaningful observations from the bivariate analysis

Data Pre-processing

Splitting the dataset
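
A minimal sketch of a stratified train/validation/test split, which keeps the share of churners roughly equal in every subset. The synthetic `X` and `y`, the 60/20/20 proportions, and the `random_state` value are assumptions chosen for illustration; they are not taken from the original notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the churn data (assumption): 1,000 rows, 5 features,
# roughly 20% of customers labelled 1 (left the bank).
rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 5))
y = (rng.random(1000) < 0.2).astype(int)

# Hold out 20% as a test set, then take 25% of the remainder as validation,
# which gives an overall 60/20/20 split. stratify preserves the class ratio.
X_temp, X_test, y_temp, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
X_train, X_val, y_train, y_val = train_test_split(
    X_temp, y_temp, test_size=0.25, stratify=y_temp, random_state=42
)

for name, part in [("train", y_train), ("val", y_val), ("test", y_test)]:
    print(f"{name}: n={len(part)}, positive rate={part.mean():.3f}")
```

Stratifying matters most when one class is rare, because an unstratified split can leave a small validation or test set with too few positive examples to evaluate on.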

Model building

Model 1

1) Imbalanced dataset: As seen in the EDA, this dataset is imbalanced and contains more examples that belong to class 0.

2) Decision threshold: Because the dataset is imbalanced, a cutoff of 0.5 on the predicted probability may not be the best choice. We can use the ROC curve to find the optimal threshold and use that threshold for prediction.
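
One common way to pick a threshold from the ROC curve is to maximize Youden's J statistic (true positive rate minus false positive rate). The sketch below assumes that criterion; the original notebook may have used a different one. The labels `y_true` and scores `y_prob` are synthetic stand-ins for the validation labels and the network's predicted probabilities.

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

# Synthetic stand-ins (assumption): about 20% positives, and scores that
# tend to be higher for customers who actually left.
rng = np.random.default_rng(0)
n = 2000
y_true = (rng.random(n) < 0.2).astype(int)
y_prob = np.clip(rng.normal(loc=0.25 + 0.35 * y_true, scale=0.15), 0.0, 1.0)

# roc_curve returns one (fpr, tpr) point per candidate threshold.
fpr, tpr, thresholds = roc_curve(y_true, y_prob)

# Youden's J = TPR - FPR; pick the threshold where it is largest.
j_scores = tpr - fpr
best = int(np.argmax(j_scores))
optimal_threshold = thresholds[best]

# Apply the chosen threshold instead of the usual 0.5 cutoff.
y_pred = (y_prob >= optimal_threshold).astype(int)

print(f"ROC-AUC: {roc_auc_score(y_true, y_prob):.3f}")
print(f"Optimal threshold: {optimal_threshold:.3f}")
print(f"TPR at threshold: {tpr[best]:.3f}, FPR at threshold: {fpr[best]:.3f}")
```

The threshold should be chosen on validation data and then applied unchanged to the test set, otherwise the reported test performance is optimistic.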

Let's try changing the optimizer, tuning the decision threshold, increasing the number of layers, and configuring some other hyperparameters accordingly, in order to improve the model's performance.

Model Performance Improvement

- Comment on which metric is right for evaluating model performance, and why
- Find the optimal threshold using the ROC curve
- Comment on model performance
- Can model performance be improved? Check and comment
- Build another model to implement these improvements
- Include all the models that were trained on the way to the final one
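
To make the metric question concrete, the sketch below compares a trivial baseline that predicts "stays" for everyone with a hypothetical model on an imbalanced label set. The 80/20 class ratio and the hypothetical model's errors are assumptions made up for illustration, not results from this case study.

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score, f1_score

# Assumed label distribution: 80 customers stayed (0), 20 left (1).
y_true = np.array([0] * 80 + [1] * 20)

# Baseline that never predicts churn.
always_stay = np.zeros_like(y_true)

# Hypothetical model: 10 false alarms and 6 missed churners.
model_pred = y_true.copy()
model_pred[:10] = 1    # 10 customers who stayed are flagged as churners
model_pred[80:86] = 0  # 6 churners are missed

for name, pred in [("always stay", always_stay), ("hypothetical model", model_pred)]:
    acc = accuracy_score(y_true, pred)
    rec = recall_score(y_true, pred)
    f1 = f1_score(y_true, pred, zero_division=0)
    print(f"{name}: accuracy={acc:.2f}, recall={rec:.2f}, f1={f1:.2f}")
```

The baseline reaches 0.80 accuracy while catching no churners at all, which is why accuracy alone can mislead on imbalanced data and why recall, F1, or ROC-AUC are often reported alongside it.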

Model 2

Model 3

Model 4

Model 5

Random Search CV

Accuracy and F1 score have decreased slightly; however, the ROC curve is smooth and consistent between the training and validation sets.

Model 6

Grid Search CV

Best accuracy so far: 0.8565104206403097 (about 0.857), obtained with 5 layers.

Model - Keras Tuner based recommendation

Model Performance Evaluation

Optimal Model

Conclusion and key takeaways

Thank you - Amogh